PART A - DIRECTOR’S OVERVIEW


JSEP  continues  to  play  an  important  role  in  the  electronics  research  at  the  University  of 
California, Berkeley. Its emphasis on science and relatively stable funding provide an
increasingly rare environment for conducting the more basic research and exploring promising new
areas.  It  also  provides  a  unique  opportunity  for  encouraging  collaborative  research  involving 
multiple  principal  investigators  or  new  faculty  members.  Currently,  JSEP  at  U.C.  Berkeley 
partially  supports  the  research  of  10  faculty  and  23  graduate  students. 

Progress is reported here for nine projects under three themes. Under the theme Quantum
Electronic Devices, nonlinear optics in compound semiconductor waveguides is investigated
for such applications as signal correlators and spectrum analyzers, ultrafast optical pulse
techniques are developed, and electro-optic and photoelastic effects are applied to probe
semiconductor devices and interfaces. The theme of Electronic Devices includes three work units.
The 0.1 μm Bulk and SOI Devices unit follows and extends the 0.25 μm devices project to
explore the new device physics and limitations of future IC devices. Ferroelectrics and
conductive oxides are being studied as breakthrough materials for memory devices. Insulated-gate
interfaces and dielectrics in GaAs FETs are investigated to understand the limitations and
to search for a practical device structure. Theme III integrates various aspects of artificial
neural network (ANN) research into one program. Variable-weight devices, especially EEPROMs, are
being developed to implement a novel device concept and ANN learning algorithms. Well-developed
logic synthesis tools are being used to perform the learning task inherent to the
ANN concept. Layered ANN architectures and fault-tolerant ANNs are being investigated. The
critical issues of parallel computing are studied. Finally, ANNs are applied to signal processing.

Several significant accomplishments are highlighted in Part B of this report. A single
interface  trap  generated  by  hot  carriers  has  been  observed  and  characterized.  Transistor  current 
change  due  to  the  filling  and  emptying  of  a  single  trap  was  observed  in  very  small  MOSFETs. 




A new ferroelectric memory cell overcoming the read-cycle limitation is described. An
EEPROM-based parallel nearest-neighbor algorithm has the potential for achieving a 100X
increase in computation speed without the need for learning. The fault tolerance of ANNs is
shown by example to be less than excellent, and a method for improving the fault tolerance is
developed.

The enclosed Annual Report Appendix includes copies of 17 published articles, 9 conference
papers, 5 papers submitted for publication, and abstracts of 7 Master’s/Ph.D. theses.


PART  B  -  SIGNIFICANT  ACCOMPLISHMENTS 


Realization  of  a  Reflector  with  a  Window  for  Optical  Pumping  of  a 
Surface  Emitting  Laser  (SELD) 

Professor  S.  Wang  with  Mark  Hadley 

A  surface  emitting  laser  offers  many  distinct  advantages  over  an  edge  emitting  laser. 
These  include  the  possibility  of  forming  a  two-dimensional  array  and  the  potential  of  turning 
on  by  an  optical  beam  with  a  speed  much  faster  than  achievable  by  a  current  pulse.  However, 
an optically pumped SELD requires a specially designed reflector which provides a high
reflectivity at the laser wavelength λl and a low reflectivity at the pump wavelength λp, which
is shorter than λl. The accompanying figure shows the measured reflectance of such a reflector grown in
our  laboratory  by  MBE.  The  window  of  low  reflectivity  from  9,100  to  9,700  angstroms  is 
intended  for  the  pump  beam  to  be  coupled  into  the  SELD  cavity  while  the  region  of  high 
reflectivity  from  9,900  to  10,300  angstroms  is  intended  to  provide  a  high-Q  cavity  for  the  laser 
beam. 

The experimentally measured reflectance curve is still far from ideal, but not
because of faulty design. As a matter of fact, the theoretically calculated reflectance curve
shows a broad maximum of near-unity reflectivity and a broad window of near-zero reflectivity
(or near-perfect transmission). However, because the MBE growth was unexpectedly interrupted,
the fabricated structure deviates from the designed structure. With improved computer control
of the MBE growth process, we expect to obtain a reflector with the desired reflectance
characteristic. In any case, the experimental result, though it is our first attempt, does confirm the
validity of the design concept and does represent the first demonstration of its implementation.
[Figure: measured power reflectivity of the reflector versus wavelength]


Characterizing  a  Single  Hot-Electron-Induced  Trap  in  Submicron  MOSFET 
Using  Random  Telegraph  Noise 

Professors  C.  Hu  and  P.K.  Ko  with  P.  Fang 

Individual interface traps generated by hot electron stress are observed for the first time
[1]. Single trap filling and emptying can cause 0.1% step noise in Id due to Coulombic scattering
in a very-small-size MOSFET. Trap location (3-10 angstroms from interface), time constant,
energy and escape frequency are found to be very different from those of process-induced traps.

The  deep-submicron  devices  used  in  this  study  were  fabricated  using  a  photoresist-ashing 
technique  [2].  The  oxide  thickness  is  8.6  nm  and  substrate  doping  density  is  5x10  cm  . 

Fig. 1 shows the current noise after hot electron stress at Vg = 2 V, Vd = 4.5 V, Isub = 10
pA for 10 minutes. It shows the current fluctuation in a deep-submicron n-MOSFET with
Weff = 0.5 μm, Leff = 0.35 μm. The striking two-level current fluctuation is due to the filling
and emptying of a single interface trap. It is known as the Random Telegraph Noise (RTS)
and is observed only when the channel area is small enough to contain only one trap within
a few kT of the Fermi level. This is the first observation of a single hot-electron-generated
interface trap.

RTS noise can be a useful tool for studying stress-induced interface traps. It is easier to
observe stress-induced traps than process-induced traps due to the small stress area and low
stress-induced trap density after light stressing. Using RTS as a characterization tool, we
found the stress-induced trap to be located closer to the interface, and therefore to have a shorter
time constant and a much stronger influence on scattering and ΔId than process-induced traps.
Table I lists the time constants of the hot-electron-induced interface trap and of process-induced, i.e.
pre-stress, interface traps. The former are about 50 times shorter than the latter.
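
The two-level behavior described above can be illustrated with a minimal simulation sketch. It assumes the trap switches between "empty" and "filled" as a two-state Poisson process and that a filled trap lowers the drain current by a fixed fraction (0.1% here, following the step size quoted above). The function name, the dwell times, and the sign of the current step are illustrative assumptions, not measured values.

```python
import random

def simulate_rts(tau_empty, tau_filled, n_steps, dt, delta_frac=0.001, seed=0):
    """Return a two-level random telegraph signal as fractional drain-current change.

    tau_empty, tau_filled: mean dwell times (s) in the empty and filled states.
    n_steps, dt: number of samples and sampling interval (s); dt must be << tau.
    delta_frac: fractional current drop while the trap is filled (assumed value).
    """
    rng = random.Random(seed)
    filled = False
    samples = []
    for _ in range(n_steps):
        tau = tau_filled if filled else tau_empty
        # Leaving the present state is a Poisson process with rate 1/tau.
        if rng.random() < dt / tau:
            filled = not filled
        samples.append(-delta_frac if filled else 0.0)
    return samples

# Example: equal 10 ms dwell times sampled every 1 ms.
trace = simulate_rts(tau_empty=1e-2, tau_filled=1e-2, n_steps=10000, dt=1e-3)
```

A shorter time constant, as reported for the stress-induced trap, would show up in such a trace as more frequent switching between the two levels.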
Ferroelectric Memory Cell with Unlimited Read/Write Cycles
Professor  C.  Hu  with  R.  Moazzami 

Single-transistor ferroelectric random access memory (FRAM) has received much attention
lately. Ramtron and National Semiconductor have introduced commercial products. For
military applications, its radiation hardness and endurance (10^10 write cycles), superior to
those of floating-gate nonvolatile memories, are particularly attractive.

Unfortunately, FRAM makes no distinction between "write" and "read," i.e. the read
cycles are also limited to 10^10. 10^9 read cycles can be consumed in 30 seconds
if a cell is read continuously at 33 MHz. DARPA has a large program on FRAM with one
major goal being the improvement of the read cycles.
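
The rate at which continuous reading uses up endurance is simple arithmetic, sketched below. The cycle counts passed in are illustrative inputs chosen for the example, and the function name is hypothetical; only the 33 MHz read rate is taken from the text.

```python
def seconds_to_exhaust(cycles, read_rate_hz):
    """Time (s) to use up `cycles` read cycles when reading continuously."""
    return cycles / read_rate_hz

# Illustrative endurance values, read continuously at 33 MHz.
for cycles in (1e9, 1e10):
    t = seconds_to_exhaust(cycles, 33e6)
    print(f"{cycles:.0e} cycles last about {t:.0f} s")
```

Under these assumptions, 10^9 cycles last about 30 seconds and 10^10 cycles about 5 minutes, which is why a destructive read is a serious limitation.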

We have invented a new ferroelectric memory cell that overcomes this shortcoming. The
cell is read as a DRAM without switching the polarity of the ferroelectric polarization. During
"read," the ferroelectric film simply serves as a high-permittivity dielectric in the DRAM
capacitor, enabling a smaller cell size than the state-of-the-art 16 Mb DRAM cell. In the write
mode, the memory operates as a FRAM. However, nonvolatile "write" is performed only
when the power supply is lost. This scheme is similar to that of NVRAM (nonvolatile RAM).
However, an NVRAM cell is twice as large as an SRAM cell and its write endurance is limited to
about 10,000 cycles, while the new ferroelectric memory has the size of a DRAM and a write
endurance of 10^10 cycles.

An  invention  disclosure  has  been  filed.  The  ferroelectric  film  characteristics  necessary  for 
the  cell  operation  have  been  studied  and  reported.  Effort  is  being  made  to  fabricate  a  working 
device. 
• Entire cell is denser than present DRAMs because the ferroelectric film
is equivalent to 10 Å of SiO2 in the capacitor.


Learning  Through  Logic  Synthesis 
Professor  Alberto  Sangiovanni-Vincentelli  with  Arlindo  Oliveira 

We have shown that logic synthesis techniques can be used to derive networks of threshold
gates that perform rule induction from examples. Using logic synthesis techniques, we are
able  to  derive  networks  that  perform  the  required  mapping,  but  are  of  smaller  size  than  the 
ones  obtained  by  alternative  algorithms.  It  has  been  shown  that  the  quality  of  the  induction 
rules  obtained  is  closely  connected  to  the  complexity  of  the  hypothesis  generated.  Therefore, 
our  ability  to  generate  small  networks  implies  better  generalization  than  the  one  obtained  by 
alternative  techniques. 

We have compared the quality of the induction performed by our algorithm with two distinct
approaches: decision trees and back-propagation. The results obtained on a set of representative
examples have shown that the logic synthesis approach outperforms both of these methods,
both  in  the  number  of  errors  and  in  the  number  of  examples  needed  to  generate  an  exact 
representation  of  a  given  concept. 
Massively Parallel Analog Geometric Computation Using EEPROMs
A. Kramer, P. K. Ko and Alberto Sangiovanni-Vincentelli

A novel architecture for massively parallel analog computation is presented which uses an
EEPROM as the essential computational building block. The proposed architecture is similar
in appearance to a memory array, in that the chip stores a set of d-dimensional memories, one
per row. In contrast to standard digital memory, each row in the proposed chip is capable of
storing in analog either a point, a hypersphere, or a hyperrectangle in d-dimensional space. The
other important difference between this architecture and a standard memory chip is that a
substantial amount of computation can be performed in parallel directly on all the stored
memories. In particular, the chip computes geometric relationships based on Euclidean distance
between the stored memories and a new d-dimensional query point. The relationships the chip
is capable of computing include:

•  euclidean  distance  between  the  query  point  and  all  stored  points 

•  euclidean  distance  between  the  query  point  and  all  stored  hyperrectangles 

•  exclusion  of  the  query  point  in  all  stored  hyperrectangles 

•  exclusion  of  the  query  point  in  all  stored  hyperspheres 

Inclusion of some control circuitry and a priority queue allows either the k nearest
points/rectangles to the query point or all rectangles/spheres enclosing the query point to be
read out of the chip. In addition, the architecture is capable of performing these computations
on any subset of the d dimensions, giving it obvious utility as an analog associative memory
chip. In fact, the design is similar to a recent digital content addressable memory (CAM) chip
[1]. For the CAM task, the proposed analog implementation has several advantages over this
digital design, including density [O(10^3) times more memory/chip], speed [O(10^2) times as fast]
and the fact that it can perform the associative memory function on analog as well as Boolean
vectors.
Both the analog storage and the on-chip computation are performed by EEPROMs. Earlier
work has shown the ability to set the threshold voltage of an EEPROM to an analog value
with up to 8 bits of precision [2]. The analog computation is performed by storing points (or
rectangles) as the threshold voltages of a row of EEPROMs connected by a common drain and
applying the query point as analog voltages on the gates of the same devices. By making use
of the inherent I-V characteristics of an EEPROM in saturation [Id ∝ (Vg - Vt)^2] and taking
advantage of current summing, the result of this circuit is a current into each row proportional
to the squared Euclidean distance between the query point and the point stored on that row. The
speed [O(μs) settling time] and density [O(10^6) elements per chip, i.e. 10^6/d d-dimensional
vectors] of the proposed architecture promise to make it a powerful engine for real-world
computational tasks such as associative memory and pattern classification.
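
The distance computation can be mimicked in software with a minimal sketch. It assumes an idealized square-law device, so that each row current is k times the sum of (Vg - Vt)^2 over the devices in the row, and it treats positive and negative differences alike (a real transistor conducts only for Vg above Vt, so the actual circuit needs more than this model shows). The function names, the gain factor k, and the `dims` argument are hypothetical.

```python
def row_currents(stored_vt, query_vg, k=1.0, dims=None):
    """Idealized row currents: I_row = k * sum_i (Vg_i - Vt_i)^2.

    stored_vt: list of rows, each a list of d stored threshold voltages.
    query_vg:  list of d gate voltages encoding the query point.
    dims:      optional subset of dimensions to include in the sum.
    """
    if dims is None:
        dims = range(len(query_vg))
    dims = list(dims)
    return [k * sum((query_vg[i] - row[i]) ** 2 for i in dims) for row in stored_vt]

def k_nearest_rows(stored_vt, query_vg, n, dims=None):
    """Indices of the n rows with the smallest current (smallest distance)."""
    currents = row_currents(stored_vt, query_vg, dims=dims)
    return sorted(range(len(currents)), key=currents.__getitem__)[:n]

stored = [[1.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
nearest = k_nearest_rows(stored, [2.0, 2.0], n=2)
```

On the chip all rows are evaluated at once by current summing; the Python loop only stands in for that parallelism, and the sort stands in for the priority queue.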
Fault Tolerance in Feed-Forward Artificial Neural Networks


Reed D. Clay (Professor C. H. Séquin)

The  errors  resulting  from  defective  units  and  faulty  weights  in  layered  feed-forward 
ANN’s  are  analyzed,  and  techniques  to  make  these  networks  more  robust  against  such  failures 
have  been  explored.  First,  using  some  simple  examples  of  pattern  classification  tasks  and  of 
analog  function  approximation,  we  have  demonstrated  that  standard  architectures  subjected  to 
normal  backpropagation  training  techniques  do  not  lead  to  any  noteworthy  fault  tolerance. 
Additional  redundant  hardware  coupled  with  suitable  new  training  techniques  is  necessary  to 
achieve  that  goal. 

A simple and general procedure has been found that develops fault tolerance in neural
networks: The types of failures that one might expect to occur during operation are introduced
at random during the training of the network, and the resulting output errors are used in a
standard way for backpropagation and weight adjustment. The result of this training method is a
modified internal representation that is not only more robust to the type of failures encountered
in training, but which is also more tolerant of faults for which the network has not been
explicitly trained.
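
The training procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single hidden layer of sigmoid units, a stuck-at-zero failure of one randomly chosen hidden unit as the only fault type, and plain stochastic backpropagation. All names and hyperparameters are hypothetical.

```python
import math
import random

def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))

def forward(net, x, dead=None):
    """Return (hidden activations, output); hidden unit `dead` is stuck at 0."""
    w1, b1, w2, b2 = net
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    if dead is not None:
        h[dead] = 0.0
    y = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
    return h, y

def train_with_faults(data, n_hidden=6, epochs=2000, lr=0.5, p_fault=0.5, seed=0):
    """Backpropagation in which a random hidden unit fails with probability p_fault."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in data:
            # Inject the fault, then use the resulting output error as usual.
            dead = rng.randrange(n_hidden) if rng.random() < p_fault else None
            h, y = forward((w1, b1, w2, b2), x, dead)
            dy = (y - t) * y * (1.0 - y)
            for j in range(n_hidden):
                if j == dead:
                    continue  # a failed unit passes no signal and no gradient
                dh = dy * w2[j] * h[j] * (1.0 - h[j])
                w2[j] -= lr * dy * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * dh * x[i]
                b1[j] -= lr * dh
            b2 -= lr * dy
    return w1, b1, w2, b2

xor = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
net = train_with_faults(xor, epochs=200)
```

Robustness can then be probed by calling `forward(net, x, dead=j)` for each hidden unit j and comparing the outputs with the fault-free case.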

Ongoing work concerns a more detailed investigation of how the effect of failing hidden
units can be mitigated in analog function approximation tasks. We have discovered a promising
approach which achieves this goal by tightly controlling the fractional contribution that
each hidden unit makes to the linearly summed output value.

Publications 

[1] C. H. Séquin and R. D. Clay, "Fault Tolerance in Artificial Neural Networks," to appear
in Neural Networks, Concepts, Applications and Implementations, Vol. 4, Antognetti and
Milutinovic, eds., Prentice Hall, 1991. Available as Technical Report No. 90-031 from
International Computer Science Institute, Berkeley.
PART C - INDIVIDUAL WORK UNITS


I-A.  Nonlinear  Optics  in  Compound  Semiconductors 
Professor  S.  Wang  with  Patrick  Harshman 

Our  work  on  nonlinear  optics  in  compound  semiconductors  during  the  past  year  has  been 
focused  on  two  topics:  an  investigation  of  (111)  strained  layer  structures  and  further  work  on 
the  surface-emitting  second-harmonic  generation  scheme  which  was  developed  under  JSEP 
support  and  reported  on  last  year. 

Strained  layers  grown  in  the  (111)  direction  have  been  predicted  to  possess  large  built-in 
electric  fields  and  our  goal  is  to  exploit  these  built-in  fields  to  achieve  strongly  enhanced 
second- and third-order nonlinear optical effects. We have made significant progress on the
problem  of  the  growth  of  (111)  strained  layers,  and  have  demonstrated  (111)  strained  layers 
which  are  of  high  optical  quality  [1].  More  recently,  we  have  studied  the  low  temperature 
photoluminescence  characteristics  of  these  structures  and  have  found  evidence  which  suggests 
the  attainment  of  self-biased  strained  quantum  wells  [2].  This  work  constitutes  the  first  direct 
optical  evidence  of  the  existence  of  these  built-in  electric  fields. 

Figure 1 shows the 5 K photoluminescence spectrum from an AlAs/AlInAs
multi-quantum-well structure grown on a 2° tilted semi-insulating GaAs substrate. We attribute the
peak at 8340 angstroms (1.487 eV) to a C1-hh1 excitonic transition and the peak at 8200
angstroms (1.511 eV) to a C1-hh2 excitonic transition. In the presence of strain the valence
band energy surfaces are described by (1)

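$$E_k = A k^2 + a\varepsilon \pm \sqrt{\xi_k + \xi_{\varepsilon k} + \xi_{\varepsilon}} \qquad (1)$$

where

$$\xi_k = B^2 k^4 + C^2 \left[ k_x^2 k_y^2 + k_y^2 k_z^2 + k_z^2 k_x^2 \right]$$

$$\xi_{\varepsilon k} = Bb \left[ 3 \left( k_x^2 \varepsilon_{xx} + k_y^2 \varepsilon_{yy} + k_z^2 \varepsilon_{zz} \right) - k^2 \varepsilon \right] + 2Dd \left( k_x k_y \varepsilon_{xy} + k_y k_z \varepsilon_{yz} + k_z k_x \varepsilon_{zx} \right)$$

$$\xi_{\varepsilon} = \frac{b^2}{2} \left[ (\varepsilon_{xx} - \varepsilon_{yy})^2 + (\varepsilon_{yy} - \varepsilon_{zz})^2 + (\varepsilon_{zz} - \varepsilon_{xx})^2 \right] + d^2 \left( \varepsilon_{xy}^2 + \varepsilon_{yz}^2 + \varepsilon_{zx}^2 \right)$$

and $\varepsilon = \varepsilon_{xx} + \varepsilon_{yy} + \varepsilon_{zz}$.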


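Here a, b, and d are the material-dependent deformation potentials and the $\varepsilon_{ij}$'s are the strain
tensor elements. For biaxially strained (111) layers, $\varepsilon_{xx} = \varepsilon_{yy} = \varepsilon_{zz}$ and
$\varepsilon_{xy} = \varepsilon_{yz} = \varepsilon_{zx}$ because of the three-fold symmetry of the (111) plane.
Therefore, the change in the energy gap at zone center (k = 0) is

$$\Delta E_g = 3a\varepsilon_{xx} \pm \sqrt{3}\, d\, \varepsilon_{xy} \qquad (2)$$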

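The strain tensor elements for a strained layer are related to the lattice mismatch through the
material elastic tensor elements $C_{ij}$ by (2)

$$\varepsilon_{xy} = -\left[ \frac{a_{ref}}{a_{layer}} - 1 \right] \frac{C_{11} + 2C_{12}}{4C_{44} + 2C_{12} + C_{11}} \qquad (3)$$

$$\varepsilon_{xx} = -\frac{4C_{44}}{C_{11} + 2C_{12}}\, \varepsilon_{xy}$$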

The value of $\Delta E_g$ can be calculated using the material parameters compiled by Adachi [3].
The assignment of the PL peak at 8340 angstroms to a C1-hh1 transition assumes a generally
used value of 10 meV for the exciton binding energy. Theoretical investigation is under way
to calculate the exciton binding energy in a quantum well, taking into account the anisotropy of
effective masses.
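
The zone-center gap shift of a (111) strained layer can be evaluated numerically with a short sketch. It assumes the usual relations for biaxial strain on a (111) plane: off-diagonal strain equal to minus the mismatch times (C11 + 2C12)/(4C44 + 2C12 + C11), diagonal strain equal to the mismatch times 4C44/(4C44 + 2C12 + C11), and a gap shift of 3a times the diagonal strain plus or minus sqrt(3) d times the off-diagonal strain. The function names are hypothetical, and every number in the example is an illustrative placeholder rather than a value from this work or from Adachi's tables.

```python
import math

def strain_111(a_ref, a_layer, c11, c12, c44):
    """Return (e_xx, e_xy) for a layer strained to a_ref on a (111) plane."""
    mismatch = a_ref / a_layer - 1.0
    denom = 4.0 * c44 + 2.0 * c12 + c11
    e_xy = -mismatch * (c11 + 2.0 * c12) / denom
    e_xx = mismatch * 4.0 * c44 / denom
    return e_xx, e_xy

def gap_shift(a_def, d_def, e_xx, e_xy):
    """Return the two zone-center shifts 3*a*e_xx +/- sqrt(3)*d*e_xy (eV)."""
    hydrostatic = 3.0 * a_def * e_xx
    shear = math.sqrt(3.0) * d_def * e_xy
    return hydrostatic + shear, hydrostatic - shear

# Placeholder inputs: lattice constants in angstroms, elastic constants in
# arbitrary common units (only ratios matter), deformation potentials in eV.
e_xx, e_xy = strain_111(a_ref=5.653, a_layer=5.700, c11=11.9, c12=5.4, c44=5.9)
upper, lower = gap_shift(a_def=-8.0, d_def=-5.0, e_xx=e_xx, e_xy=e_xy)
```

The splitting between the two branches, 2 sqrt(3) d times the off-diagonal strain, comes entirely from the shear term, which is the same component responsible for the built-in piezoelectric field in (111) layers.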

Figure 2 shows the variation of the PL spectra with excitation intensity. Two features of the
C1-hh1 transition are to be noted. First, there is a blue shift at higher excitation intensity I0.
This can be attributed to a screening of the built-in electric field by the optically generated
electron-hole pairs. Therefore, the blue shift can be used as evidence of the existence of a
built-in electric field. Work is in progress to calculate the wavelength shift caused by the Stark
effect [4]. Second, the PL spectrum shows saturation at higher I0. This happens when the carrier
concentration becomes sufficiently high to screen the Coulomb interaction responsible for
the formation of excitons. The C1-hh2 transition, on the other hand, does not show any saturation
at I0 = 145 mW. The reason for non-saturation is being investigated.

We  are  also  presently  engaged  in  the  design,  growth,  and  experimental  evaluation  of  an 
improved  performance  surface-emitting  second-harmonic  generator.  Our  approach  is  to  use 
asymmetric quantum wells with near-resonant intersubband transition energies to achieve a
second-order susceptibility which is much larger than that of the previously used bulk GaAs.
The quantum well scheme has the additional advantage of offering the potential for
phase-matching in the propagation direction of the surface-emitted second-harmonic signal.
I-B. Ultrafast Optical Techniques

Professor John Stephen Smith with Hong Lin, Jeff Walker, Sol Dijaili, Gordon Wilson
and James Yeh

We have now demonstrated low-threshold, moderately high-power surface emitting lasers
using our phase-locked molecular beam epitaxy technique. We are currently emphasizing
work on higher-power versions of this laser structure. In addition, we plan to use
multiquantum-well saturable absorbers embedded in an MBE-grown Bragg layer distributed reflector to
passively mode lock the Ti:Sapphire laser. A version of this technique was first demonstrated
by Keller, et al. A wavelength-tunable short-pulse system will allow us to characterize the
saturable absorber/mirror.

We have recently demonstrated the usefulness of cross-phase modulation in a semiconductor
amplifier to modulate the properties of short pulses. In particular, we have demonstrated
the removal of an adiabatic chirp from ultrashort pulses using cross-phase modulation.
A simple expression for the chirp imparted on a weak signal pulse by the action of a strong
pump pulse has been derived. A novel dispersive technique for characterizing the resulting
nonlinear chirp was introduced and used in the experiment. A maximum frequency excursion
of 16 GHz due to the cross-phase modulation was measured. A value of 6 was found for α_xpm,
which is a factor for characterizing the cross-phase modulation in a similar manner to the
conventional linewidth enhancement factor, α.

We are investigating the use of frequency up-conversion to extract and demultiplex
high-bandwidth, time-multiplexed signals from optical fibers. The technique uses a synchronized
local optical pulse source which is optically mixed with the signal, and the sum frequency is
detected with a PIN device or other detector. This would allow one channel of N to be
decoded, where N is the ratio of the speed of the local optical pulse to the speed of the
detector at the sum frequency. The sum frequency conversion efficiency can be high, with sufficient
local pulse power. We have constructed a demonstration two-channel system. The time
resolution of this technique and the noise performance have been measured. An interesting
extension of this work will be the use of a partially phase-matched technique in which the nonlinear
interaction takes place in a waveguide, and the sum frequency is generated as a wave
propagating out from the waveguide at an angle. The dispersion of the waveguide will then sweep the
local pulse temporally through the signal, converting the time domain into spatially separated,
low-frequency signals. Thus, with a single local pulse, the waveguide device, and an array of
simple detectors, many channels can be decoded at once.
Optical Parametric Interaction" presented at the Conference on Lasers and Electro-Optics,
Anaheim,  California,  May  1990. 

[4]  S.P.  Dijaili,  J.M.  Wiesenfeld,  G.  Raybon,  C.A.  Burrus,  A.  Dienes,  J.S.  Smith,  J.R.  Whin- 
I-C. Optical Probing of Semiconductor Devices and Interfaces by Electro-Optic
and  Photo-Elastic  Effects 

Professor  S.  Wang  with  Mark  Hadley 

One important development in GaAs technology is the discovery of a low-temperature
GaAs buffer layer (LTBL) [1] which has eliminated the backgating effect. Even though the LTBL has a
very low level of response to excitation by visible light due to its extremely short carrier lifetime,
it can respond to below-gap photo-excitation by lifting electrons from deep-level traps. As a
matter of fact, such a study can lead to elucidation of the physical mechanism responsible for
the semi-insulating property. We plan to use a tunable Ti:sapphire laser as the excitation source
to study the dynamics of carrier de-trapping and trapping. As a preparatory step, we have
started to grow LT buffer layers under different growth conditions. Figure 1 shows the quality
of the grown layers as a function of layer thickness and substrate temperature at a constant
As/Ga flux ratio. The solid line indicates the demarcation between films of good and bad
morphology. We plan to take and examine TEM micrographs to determine the sizes and density
of As precipitates [2,3].

While we are waiting for LT GaAs films, we have come up with a novel and potentially
important idea for making an optically pumped surface emitting laser. Periodic-layered
structures commonly used for surface emitting laser diodes (SELDs) exhibit a reflectance curve
consisting of a main stop band and symmetric diffraction side-lobes. This type of reflectance
characteristic limits the percentage of optical power which can be coupled into the laser cavity.
For an optically pumped SELD, we need a window in the reflectance curve through which a large
percentage of the pumping power can be coupled into the active region of the laser cavity.

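We have looked at ways to increase the efficiency of an optically pumped SELD.
Theoretically, the maximum fraction of the pump power that can be absorbed is given by

$$\frac{P_{absorbed}}{P_{incident}} = \frac{\left(1 - e^{-\alpha l}\right)\left(1 + R e^{-\alpha l}\right)}{1 - R e^{-2\alpha l}} \qquad (1)$$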
where $e^{-\alpha l}$ is the attenuation factor of the pump beam in the gain region and R is the reflectivity of the
lower mirror. Some typical numbers are $e^{-\alpha l}$ = 0.9 and R = 0.98. Using these numbers in Eq.
(1) shows that 91% of the pump beam can be absorbed. This is much higher than reported in
any publication. In order to achieve this maximum absorption it is necessary to modify the
upper mirror structure to match the power into the gain region.
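
As a numerical check of the 91% figure, the sketch below evaluates one closed form for the absorbed fraction. It assumes that t = exp(-αl) is the single-pass transmission of the gain region, that the upper mirror is matched so that only the round-trip losses matter, and that the absorbed fraction is (1 - t)(1 + Rt)/(1 - Rt^2). This form is an assumption chosen because it agrees with the quoted numbers, and the function name is hypothetical.

```python
def absorbed_fraction(single_pass_transmission, lower_mirror_reflectivity):
    """Fraction of the pump absorbed in the gain region of a matched cavity."""
    t = single_pass_transmission
    r = lower_mirror_reflectivity
    # Absorbed on the way down (1 - t) plus on the way back up r*t*(1 - t).
    absorbed_per_round_trip = (1.0 - t) * (1.0 + r * t)
    # Everything not returned to the top after one round trip is lost.
    lost_per_round_trip = 1.0 - r * t * t
    return absorbed_per_round_trip / lost_per_round_trip

fraction = absorbed_fraction(0.9, 0.98)
print(f"{fraction:.3f}")  # prints 0.913
```

The remaining 9% or so leaks through the lower mirror, so a higher lower-mirror reflectivity raises the absorbed fraction further.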

For the specific case of large loss, the problem reduces to making the top mirror
transparent at the pump wavelength. A computer program was designed to make a "pump
window" in the mirror without reducing the peak reflectivity. A theoretical plot of such a mirror
is shown in Figure 2. Such a mirror was grown using MBE with good results. The
experimental reflectance curve for the MBE-grown mirror is shown in Figure 3. Note the appearance of a
broad window of low reflectance in the 9,100 to 9,700 angstroms region. Due to an
unexpected interruption and change in conditions during MBE growth, the experimental curve deviates
somewhat from the theoretical curve. However, the result does confirm the validity of the
concept and represents its first demonstration.
II-A. 0.1 μm BiCMOS Devices in Bulk and SOI Substrates

Professors  C.  Hu  and  P.K.  Ko  with  J.  Chung,  P.  Nee  and  F.  Assaderaghi 

During this period we have designed and generated two new mask sets for 0.1 μm
CMOS and lateral bipolar transistors. One set was designed with SOI (silicon on insulator)
devices in mind. Development of the lithography and etching techniques necessary for
fabricating 0.1 μm devices has begun. In the meantime we have characterized submicron
MOSFETs down to 0.2 μm channel length, and begun to investigate the SOI devices. Eight papers
resulting from this work unit have been published in this period.

The two test masks were designed to test the feasibility of producing 0.1 μm channel
length or lateral base width using a novel lithography technique previously developed with
JSEP support. An I-line stepper is used to produce 0.6 μm photoresist lines. The developed
photoresist pattern is isotropically etched in oxygen plasma in a manner often called photoresist
ashing until the desired linewidth is achieved. This technique has been used successfully by us
to fabricate 0.25 μm devices under JSEP sponsorship. Those devices set the world record for
room-temperature silicon device speed at 22 ps. We have just completed one test run aimed at
producing 0.1 μm devices. The electrical results are unclear because of an ohmic contact
problem. The samples are currently being studied using high-resolution SEM.

One of the mask sets has SOI devices in mind. Both N-channel and P-channel
MOSFETs are included. A unique feature is the inclusion of several novel structures having the MOSFET
body either shorted to the source or separately contacted. The SOI MOSFET body is usually
floating. The floating body is suspected of causing low breakdown voltage and of preventing
measurement of the substrate current, a powerful tool for studying the high-field effects in
deep-submicron MOSFETs. These structures will eliminate these problems. Many lateral
bipolar transistor structures are also designed. We believe that complementary (NPN and PNP)
lateral bipolar devices will be very attractive on SOI substrates.
We have found a noise peak in SOI MOSFETs and used the noise to study the SOI
Si/SiO2 interfaces [1]. One important conclusion is that the bottom interface in a SIMOX device
has no higher interface trap density than state-of-the-art gate oxides.

For  the  first  time  a  single  hot-carrier  generated  interface  trap  has  been  observed  and 
characterized [2]. In a sufficiently small MOSFET, e.g., 0.2 μm x 0.5 μm, a single interface
trap  can  cause  clear  step-function  changes  in  the  source-drain  current  as  the  trap  is  filled  and 
emptied.  The  height  and  frequency  of  the  step  function  as  well  as  its  gate  voltage  dependence 
allowed  us  to  thoroughly  characterize  a  single  trap. 

The effect of hot-electron degradation of submicron devices on analog MOSFET performance
was studied [3]. Although the effect of device degradation on digital circuits has been
widely studied, ours is the first study of its impact on analog circuit reliability. We concluded
that many analog circuits are much more susceptible to hot-electron-induced degradation than
typical digital circuits. N- and P-MOSFET degradation by hot carriers was also studied
[4,5], and a gate current model was presented for PMOSFETs for the first time. We have also
concluded a study of the effects of source/drain series resistance on deep-submicron devices
[6]. The reduction in the measured saturation drain current relative to the ideal saturation
current (R_sd = 0.0 Ω·µm) is about 4% for a channel length of 0.7 µm and T_ox = 15.6 nm, and
10% for a channel length of 0.3 µm and T_ox = 8.6 nm. The reduction of current in the linear
regime and the reduction of the simulated ring oscillator speed are both about 3 times higher.
Silicidization of the source/drain is estimated to eliminate as much as 50% of the performance
degradation.

Finally, two studies of submicron GaAs MESFETs initiated last year have been completed.
It was found that backgating in GaAs MESFETs at high drain voltages can be
significantly reduced by properly adjusting the EL-2 center concentration. The reduction is due
to the compensation of the negative space charge at the channel-substrate interface by holes
generated by impact ionization in the MESFET channel [7]. A simple model is presented for
the negative drain current transients observed in GaAs MESFETs when subjected to ionizing
radiation [8]. The two dominant mechanisms are proposed to be electron trapping under the
Schottky gate and in the neutral semi-insulating substrate. The model is suitable for the design
and evaluation of radiation-resistant GaAs MESFET integrated circuits using common electrical
simulators such as SPICE3.

II-B. Conductive Oxides and Ferroelectrics for Programmable Devices
Professor C. Hu with Reza Moazzami and H. Shin

The large charge storage density requirement for future generations of DRAMs has generated
significant interest in high-dielectric-constant materials such as tantalum pentoxide and
yttrium oxide. However, because of the lower dielectric breakdown strengths of these materials,
the net gain in charge storage density has been a factor of two or three at best. Recently,
nonvolatile memory cells exploiting the large polarization and ferroelectric hysteresis loops of
materials such as lead zirconate titanate (PbZr_xTi_(1-x)O_3, commonly called PZT) have been
proposed. However, because these memories suffer from fatigue, a gradual loss of polarization
following repeated read/write cycling, ferroelectric materials have also been considered as a
direct replacement for oxide/nitride/oxide structures in conventional volatile DRAMs. In this
case, the ferroelectric is not cycled between the two polarization states during read/write
operation, thus possibly avoiding significant fatigue. We have proposed a ferroelectric
nonvolatile RAM (FNVRAM) which normally operates as a conventional DRAM yet also exploits the
hysteresis loop of ferroelectric materials for nonvolatile operation [1,2]. The relevant
properties of the PZT films were studied [1,2,3].

The FNVRAM cell is a simple one-transistor DRAM cell with a ferroelectric capacitor, as
shown in Fig. 1. A conductive diffusion barrier is required as the storage node contact to
prevent interdiffusion of silicon with the ferroelectric material during high-temperature
annealing. Two different bias schemes for the operation of the FNVRAM cell were described. In
one scheme, the cell plate is always held at half of the supply voltage. During
DRAM operation, the storage node is at either V_DD/2 or V_DD, so that the ferroelectric
capacitor is not cycled between opposite polarization states. Upon command or power failure, a
nonvolatile store operation is executed: the state of the cell is read and written back as one of
the two permanent polarization states of the ferroelectric film. If the DRAM datum is zero,
i.e., the storage node is at V_DD/2, the word line is selected and the bit line is grounded. The
ferroelectric is now polarized in one direction (nonvolatile zero). The recall operation is
performed similarly to a DRAM read: the remanent polarization of the ferroelectric film is sensed
and restored as one of the two DRAM states.

In this manner, the ferroelectric film is cycled between opposite polarization states only
during nonvolatile store/recall operations, not during DRAM read/write operations. Even after
10^10 store/recall cycles (corresponding to 10 store/recall cycles per second for 30 years), there
is sufficient detectable ferroelectric polarization. Therefore, fatigue from store/recall cycling is
not a serious limitation to the nonvolatile operation of this cell. Since the ferroelectric
polarization is not reversed during DRAM read/write operation, there is almost no loss in
nonvolatile polarization even after 10^1[?] read/write cycles. At this rate, the FNVRAM cell is
expected to tolerate orders of magnitude more nonswitching read/write cycles than the 10^12
switching cycles demonstrated for PZT films. A loss in the detectable polarization is observed
even during DRAM operation. However, a lower limit for the available polarization can be obtained
from the small-signal capacitance of the ferroelectric film. For the unoptimized 4000-angstrom
PZT films studied here, this lower limit is 60 fC/µm² for a 3-V power supply, equivalent to a
17-angstrom silicon dioxide film. The resistivity and endurance properties of ferroelectric
films can be optimized by modifying the composition of the film. This cell can be the basis of
a very high-density NVRAM with practically no read/write cycle limit and at least 10^10
nonvolatile store/recall cycles.
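
The two numerical claims in the paragraph above can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration: it assumes the ferroelectric capacitor can be treated as linear over the 3-V supply and uses a relative permittivity of 3.9 for silicon dioxide, neither of which is stated in the report. The function names are hypothetical.

```python
# Rough check of the endurance and charge-storage figures quoted in the text.
# Assumptions (not from the report): linear capacitor over the supply voltage,
# SiO2 relative permittivity of 3.9.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
EPS0_F_PER_CM = 8.854e-14  # vacuum permittivity in F/cm


def lifetime_cycles(cycles_per_second, years):
    """Total number of cycles accumulated at a constant rate."""
    return cycles_per_second * years * SECONDS_PER_YEAR


def equivalent_oxide_thickness_angstrom(charge_fc_per_um2, supply_volts,
                                        rel_permittivity=3.9):
    """SiO2 thickness that stores the same charge per area at the same voltage."""
    # 1 fC/um^2 = 1e-15 C / 1e-8 cm^2 = 1e-7 C/cm^2
    charge_c_per_cm2 = charge_fc_per_um2 * 1e-7
    capacitance_f_per_cm2 = charge_c_per_cm2 / supply_volts
    thickness_cm = rel_permittivity * EPS0_F_PER_CM / capacitance_f_per_cm2
    return thickness_cm * 1e8  # cm -> angstrom


store_recall_cycles = lifetime_cycles(10, 30)
eot_angstrom = equivalent_oxide_thickness_angstrom(60, 3)

print(f"store/recall cycles in 30 years: {store_recall_cycles:.2e}")
print(f"equivalent SiO2 thickness: {eot_angstrom:.1f} angstrom")
```

Under these assumptions the cycle count comes out just under 10^10 and the equivalent oxide thickness close to 17 angstroms, in line with the figures quoted above.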

We have completed a study of the enhanced conductivity of oxides grown on heavily
doped substrates. It was found that the enhanced conductivity can be attributed to the thinning
of the oxide at the field edge. This is a rather surprising finding, which makes it doubtful that
such oxides will be useful for nonvolatile memory applications. On the other hand, it has shown a
way of avoiding such enhanced conduction in cases such as oxides grown on heavily doped
substrates for switched-capacitor circuits and radiation-hard field isolation. A publication is
being prepared.

Fig. 1. The FNVRAM cell is a conventional DRAM cell with a ferroelectric capacitor dielectric.
Since the ferroelectric material has a very large polarization, it is possible to incorporate
the capacitor in the contact hole of the select transistor.

II-C. Insulated Gate GaAs Field Effect Transistors

Professors N.Cheung, S.Wang, C.Kh and W.Oldham with James Chan

The use of aluminum nitride (AlNx) as a possible dielectric material for GaAs
insulated-gate field-effect transistors (IGFETs), or metal-insulator-semiconductor field-effect
transistors (MISFETs), is being investigated. We first study the interfacial reaction of
sputter-deposited AlN and III-V semiconductors. The next step is to evaluate the electrical
properties of Metal/AlN/AlGaAs/GaAs device structures.

AlNx samples have been prepared under various sputtering conditions, and numerous
composition and electrical tests have been performed, indicating that AlNx is a good insulator
material.

We used an RF reactive sputterer with an Al target and a mixture of Ar/N2 gases
to prepare the AlNx samples. Different sputtering conditions with varying plasma power and
Ar/N2 gas mixtures have been utilized and compared. Rutherford backscattering composition
tests were employed to analyze the composition of the dielectric films. With an Al/N ratio of
1:1 (AlNx with x=1.0) as the ideal goal, results show that a 115:40 Ar/N2 mixture at a plasma
power of 300 watts gives a maximum nitrogen content of x=0.8, with minimal oxygen
concentration (see Figure 1).

The refractive indices of the samples have also been extracted. Although index values
were consistent within each run, they varied from run to run. Values ranged from 1.982 to
2.265. The average index for AlNx samples prepared under the 115:40 Ar/N2 gas mixture is
2.064. This compares favorably with data reported in the literature (n_f = 2.152) [1].

Current-voltage tests measured leakage currents on the order of 100 pA/cm² for AlNx film
sputtered on p-type Si substrate at the 115:40 Ar/N2 gas ratio. A breakdown field on the order
of 10^6 V/cm has been observed (see Figure 2), which indicates the dielectric nature of
aluminum nitride.

We are in the process of finding a correlation between dielectric quality (i.e., leakage
currents and breakdown field) and refractive index. In addition, test diode structures made of
AlNx film on GaAs as well as AlAs substrates are being fabricated. The AlGaAs/GaAs substrates
were grown by MBE in the MBE Laboratory of UC-Berkeley (Prof. J.S. Smith). Fabrication
and testing will commence early next month, at which time we will perform I-V and C-V
characterizations. These tests should reveal further information concerning the interfacial
quality of AlNx on these various substrates, and its feasibility as an insulator material for
GaAs MISFETs.

Figure 1: RBS analysis of AlNx film sputtered at 300 W, Ar/N2 ratio of 115 sccm:40 sccm, with x=0.8.

Figure 2: Current density-voltage characteristics for AlNx sputtered on p-type Si with conditions
described in Fig. 1. BV ~ 50 V, E_max ~ 10^6 V/cm.

III-A. Stochastic Neural Networks and Application to Signal Processing
Professor Avideh Zakhor with Sun-Im Shih

During the last 8 months, we have continued our investigation of signal processing
applications of neural networks. The particular application we are exploring is continuous phase
modulation receivers. Constant-envelope continuous phase modulation (CPM) schemes are
important in peak-power-limited communication applications such as satellite transmission
systems and wireless communication terminals. These schemes are generally characterized by
high packing densities and prohibitively complex receiver structures. For instance, while their
packing density increases with partial response length L and alphabet size M, their optimal
maximum-likelihood (ML) receivers require a bank of matched filters whose size grows as M^L.
This bank of filters is followed by a Viterbi decoder which draws heavily on computational
resources. Specifically, while reducing the modulation index h improves bandwidth efficiency,
the number of states in the Viterbi decoder increases with the denominator of h.

In the past few months, we have developed neural-network-based receiver structures for
constant-envelope CPM systems. Our motivation is to reduce the complexity of implementation
by casting the demodulation task into the more general framework of a neural network
classification task. In so doing, we replace the matched filter banks and the Viterbi decoder of
the optimal receiver with a feedforward net trained to demodulate the incoming baseband signal.
Our approach is to replace the entire receiver structure, excluding timing recovery, with a
multilayer feedforward neural net unit whose inputs are time samples of incoming baseband
signals, and whose outputs are the decoded symbols.

We have simulated the neural-net-based receiver for binary 3RC with a modulation index
of 0.8 and found its performance to be within 3.5 dB of the optimum Viterbi-based receiver at
a probability of error of 10^-3. The architectural parameters of this network are as follows: 18
input nodes, 2 samples per symbol interval and an observation length of 9 symbol intervals, 30
hidden nodes and one output node. This network was trained using the well-known
backpropagation learning algorithm and noisy examples. We have examined the effects of
architecture parameters on the decoding performance of our neural-net-based receivers. The main
conclusion is that while increasing the number of input nodes and hidden nodes improves the
performance of the receiver, the training complexity grows with the size of the network. We have
also developed analytical techniques to predict the signal-to-noise ratio performance of the
network without having to simulate it. This not only enables us to understand the relationship
between network parameters and performance, but also can be used as a tool in any classification
application of multilayer networks.

A number of criteria can be used to compare our architecture with conventional decoders.
These include implementation cost in terms of area and power, arithmetic and memory
requirements, performance, training time, decoding speed and training program complexity in
terms of lines of code. We shall first restrict ourselves to a simplistic comparison of arithmetic
complexity. A network with I input units, H hidden units and O output units requires H×(I+O)
multiplies and additions per symbol interval. In a digital implementation, the sigmoid would be
evaluated by way of a table lookup, and H+O lookups would be required per symbol interval.
An analog implementation of the nonlinearity is also conceivable, although this may degrade
the precision of the nonlinearity. It is not yet clear what level of accuracy is required in
NN architectures. Our best-performing NN, with 36 input nodes and 30 hidden nodes, thus
requires about 1110 adds and multiplies, in addition to 31 sigmoid evaluations per demodulated
symbol.
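
The operation count quoted above follows directly from the H×(I+O) and H+O expressions. The short sketch below simply evaluates them for the 36-input, 30-hidden, 1-output network; the function name is hypothetical and chosen only for illustration.

```python
def nn_operation_counts(num_inputs, num_hidden, num_outputs):
    """Per-symbol arithmetic cost of a single-hidden-layer feedforward net,
    using the H*(I+O) and H+O expressions given in the text."""
    multiply_adds = num_hidden * (num_inputs + num_outputs)
    sigmoid_lookups = num_hidden + num_outputs
    return multiply_adds, sigmoid_lookups


# Best-performing network described in the text: 36 inputs, 30 hidden, 1 output.
multiply_adds, sigmoid_lookups = nn_operation_counts(36, 30, 1)
print(multiply_adds, sigmoid_lookups)
```

For this network the expressions give 1110 multiply-adds and 31 sigmoid lookups per demodulated symbol, matching the figures in the text.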

For comparison, consider conventional CPM demodulation. The arithmetic complexity is
proportional to the number of trellis states, pM^(L-1). One can exploit the symmetry of the
in-phase and quadrature components to reduce the required number of matched filters to 2M^L, or
16 filters for binary 3RC. Suppose we use filters with 10 taps each; then 160 multiplies and
adds are required. An additional overhead of 40x2 multiplies, 40x3 adds, and 80 trigonometric
evaluations, to be implemented by table lookup over the 5 possible phase states, is required to
form the inputs to the Viterbi processor. Viterbi decoders use efficient recursive algorithms to
calculate the path metrics; however, they require large amounts of memory to keep track of the
actual paths which will be used to ultimately make symbol decisions. At each stage, an
add-compare-select function for each phase state is required, that is, 40 adds and 40 compares.
An additional 20 compares are required to decode the symbol. An approximate total would give
320 adds, 220 multiplies, and 60 compares, in addition to 80 coarse trigonometric table lookups
per stage. When compared to the figures obtained for the NN classifier, we conclude that
NN numerical complexity exceeds that of a conventional implementation by a factor of
approximately 3.
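
The itemized figures above can be tallied mechanically. The sketch below is only an illustration under stated assumptions: p = 5 phase states, M = 2 and L = 3 for binary 3RC, 10 taps per filter, and the overhead counts taken verbatim from the text. A straightforward sum of the itemized multiplies gives 240, somewhat above the approximate total of 220 quoted in the text. The final ratio counts each NN multiply-add as two operations, which is one possible convention and not necessarily the one used in the report.

```python
def trellis_states(p, M, L):
    """Number of trellis states, p * M**(L-1)."""
    return p * M ** (L - 1)


def matched_filter_count(M, L):
    """Matched filters after exploiting in-phase/quadrature symmetry, 2 * M**L."""
    return 2 * M ** L


P_PHASE_STATES, M_ALPHABET, L_RESPONSE = 5, 2, 3  # binary 3RC, 5 phase states
TAPS_PER_FILTER = 10                               # filter length assumed in the text

states = trellis_states(P_PHASE_STATES, M_ALPHABET, L_RESPONSE)
filters = matched_filter_count(M_ALPHABET, L_RESPONSE)
filter_ops = filters * TAPS_PER_FILTER             # multiplies and adds in the filter bank

# Itemized counts copied from the text.
conventional = {
    "multiplies": filter_ops + 40 * 2,
    "adds": filter_ops + 40 * 3 + 40,   # filters + overhead + add-compare-select
    "compares": 40 + 20,                # add-compare-select + symbol decision
    "trig_lookups": 80,
}

# NN cost from H*(I+O) with I=36, H=30, O=1; each multiply-add counted as two operations.
nn_multiply_adds = 30 * (36 + 1)
conventional_ops = (conventional["multiplies"] + conventional["adds"]
                    + conventional["compares"])
complexity_ratio = 2 * nn_multiply_adds / conventional_ops

print(states, filters, conventional, round(complexity_ratio, 2))
```

With these conventions the ratio falls between 3 and 4, consistent with the rough factor of 3 stated above.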

In addition, a digital implementation of an NN decoder would require the storage of 1110
weights in ROM versus the 160 filter coefficients needed in a conventional receiver. However,
NNs require virtually no RAM, while a conventional receiver would need at least 220 RAM
locations to keep track of trellis paths. This is but a crude comparison of memory requirements,
since the accuracy used to represent weights, filter coefficients and path metrics has
been disregarded, but it shows a tradeoff between ROM and RAM. This tradeoff is important,
since RAM can take up 50 percent of the area of a custom chip for Viterbi decoding.

The NN architecture has a number of other advantages compared to conventional
receivers. The first is the regular and parallelizable nature of NNs, which one might easily map
to homogeneous architectures, such as systolic arrays, which are amenable to VLSI
implementation. By contrast, the Viterbi algorithm is inherently serial, in that previously
decoded symbols are required to decode the present and future ones. Parallel implementations of
the Viterbi algorithm, based on block partitioning of the data, require a synchronization period
to estimate the initial state in the trellis diagram before decoding can begin. This is in fact the
main reason for overlapping the partitioned blocks. The above-mentioned synchronization
period can potentially limit the extent to which the algorithm can be parallelized. Our proposed
NN implementation has no feedback; thus, by using more than one network in parallel, we can
make the demodulation rate arbitrarily large. A second advantage lies in the speed of
demodulation. Clearly, NN receivers would be faster since they require only a forward pass, while
conventional decoders usually backtrack in order to obtain demodulated symbols. A third
advantage may be the possibility of online adaptation to the noise characteristics of the channel,
thus providing a form of nonlinear equalization. A fourth advantage, which at this point is
mere conjecture, might arise from a favorable scaling of the NN classifier with an increase in
the complexity of the modulation scheme. This is particularly important considering that
conventional methods scale as M^L.

III-B. Learning and Generalization by Neural Networks

Professor A. Sangiovanni-Vincentelli with Arlindo Oliveira and Alan Kramer

In this work unit we proposed to assess whether logic synthesis techniques
could be used to specify the interconnection patterns of neural network architectures.

The development of a special-purpose logic synthesizer, targeted at the problem of rule
induction from examples, has been undertaken. This logic synthesizer generates the most compact
two-level network of threshold gates that matches a given input-output specification. The
results have shown that a formal approach to the problem of deriving an adequate architecture
for neural networks is not only feasible but highly desirable.

Apart from proving that this approach is suitable for the derivation of appropriate
rules in cases when a compact two-level representation is known to exist, this research has led
to some very interesting conclusions:

1.  Two-level representations are not always appropriate. There are cases where either compact
two-level representations do not exist or, although they do exist, they do not generate
an adequate induction hypothesis. Results from handwritten character recognition
have shown that although a one-gate network adequately fits the training set (for each
character in the one-writer database we tested it on), the results on the test set are not
perfect, with some characters being left unclassified. These results and other related ones
have shown us the need for the extension of the techniques used so far to the synthesis of
multi-level networks.

2.  When a restriction on the largest possible size of the weights in a given network is
imposed, an algorithm for the derivation of the appropriate connection weights can be
derived from logic synthesis techniques. This algorithm runs in a time polynomial in the
number of examples and the maximum weight size and is, therefore, much more
efficient than the ones currently used, like the perceptron rule or error back-propagation. A
strong convergence theorem was derived for some (somewhat restricted) conditions,
and work on the relaxation of these conditions is under way. This limitation on the
allowed weight sizes, far from being a burden, is usually an advantage if physical
implementation of the network is desired. In fact, implementation of the large weights
commonly required by alternative techniques is usually cumbersome and, in some cases,
infeasible. Our approach, on the other hand, derives networks with small integer weights,
which are easier to implement.
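
To make the notion of a two-level network of threshold gates with small integer weights concrete, the sketch below hand-builds one for the two-input XOR function using weights restricted to +1 and -1. It is an illustration only: the weights are chosen by hand, not produced by the logic synthesizer described above, and all names are hypothetical.

```python
def threshold_gate(inputs, weights, threshold):
    """Output 1 when the integer-weighted sum of the inputs reaches the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


def two_level_xor(x1, x2):
    """Two-level threshold network for XOR with weights in {-1, +1}."""
    # First level: an OR gate and an AND gate on the raw inputs.
    h_or = threshold_gate((x1, x2), (1, 1), 1)
    h_and = threshold_gate((x1, x2), (1, 1), 2)
    # Second level: fires when OR is true and AND is false.
    return threshold_gate((h_or, h_and), (1, -1), 1)


truth_table = {(a, b): two_level_xor(a, b) for a in (0, 1) for b in (0, 1)}
print(truth_table)
```

XOR is a convenient example because no single threshold gate can realize it, so a second level is required even though every weight stays small.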

In  this  work  unit  we  also  proposed  to  investigate  massively  parallel  analog  computation 
and  its  application  to  the  real-world  computational  task  of  handwritten  character  recognition. 

Close collaboration with technology groups is necessary to ensure that the computing
architectures we are proposing are feasible. To that end we have focused on the use of
EEPROM devices as our basic computing structure. We are currently investigating several
algorithms which could be implemented efficiently in this technology.

The first of these is a standard feedforward neural network architecture based on novel
EEPROM-based synapses. Conventional neural networks are based on synapses which perform
multiplicative weighting. Once EEPROM devices are used for weight storage, the additional
area needed to perform analog multiplication dominates that required to implement a
synapse. We have discovered a class of highly compact, nonlinear weighting functions based
on the use of the novel EEPROM I-V characteristic. This approach results in a synapse requiring
one or two EEPROM devices and promises nearly a hundred-fold increase in the density of
neural network implementations.

The synapses we are investigating are nonlinear and thus not strictly in keeping with the
mainstream in neural network learning. We have developed a neural network simulator to
determine the usefulness of these nonlinear synapses when applied to a real-world learning
task. We are focusing on the problem of handwritten character recognition in this work.
Initial results have been promising, but more investigation is needed.

The  second  algorithm  which  we  are  investigating  is  the  nearest  neighbor  classification 
algorithm.  This  algorithm  has  existed  for  a  long  time  and,  given  enough  example  points,  has 
been  known  to  perform  well  on  vector-encoded  classification  tasks.  It  has  not  gained  wider 
use  because  computation  has  been  limited  to  a  sequential  digital  substrate  so  that  the  distance 
to  only  one  neighbor  could  be  measured  at  a  time.  Conventional  research  into  the  use  of  this 
algorithm  has  focused  on  the  development  of  k-d  trees  and  other  data  structures  to  cut  the 
nearest  neighbor  search  down  to  log  time,  as  well  as  ways  to  reduce  the  number  of  example 
points  needed  for  the  algorithm  to  perform  well. 
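
The sequential bottleneck described above is visible in a plain software version of the classifier, where the distance to each stored example is evaluated one at a time. The sketch below is a minimal brute-force illustration with hypothetical names and made-up example data; it is not the k-d tree search or the analog architecture discussed in this report.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_label(query, examples):
    """Return the label of the stored example closest to the query.

    examples is a list of (vector, label) pairs. Distances are computed
    one example at a time, so the cost grows linearly with their number.
    """
    best_label, best_distance = None, float("inf")
    for vector, label in examples:
        distance = squared_distance(query, vector)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# Made-up two-dimensional example points for illustration only.
examples = [((0.0, 0.0), "A"), ((1.0, 0.0), "A"), ((5.0, 5.0), "B"), ((6.0, 5.0), "B")]
print(nearest_neighbor_label((0.4, 0.2), examples))
print(nearest_neighbor_label((5.2, 4.9), examples))
```

Each pass through the loop is independent of the others, which is what makes the distance computation a natural candidate for parallel evaluation.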

We have also discovered a novel EEPROM-based architecture for performing analog distance
computation in parallel. See the Significant Accomplishments section of this report
for a discussion of this development.

III-C. Reconfigurable Analog Elements for Neural Nets
Professors Ping K. Ko and Chenming Hu

Investigation  of  using  EEPROM  devices  as  reconfigurable  analog  weights  continued  in 
the  past  year.  Two  full  CMOS-EEPROM  runs  were  completed,  which  demonstrated  our  abil¬ 
ity  to  fabricate  EEPROM-based  CMOS  circuits. 

The first run, finished in May, produced EEPROM devices with various geometries and
floating-gate structures and enabled us to verify a drain current model for EEPROM devices.
The model is needed for understanding the design tradeoffs among the various substructures
when an EEPROM cell is used for analog storage. Based on the modeling exercise and
experimental results, we have devised a promising high-density EEPROM-based analog synapse.
Other process and device information we collected from the first run also helped us get the last
glitches out of the technology.

The second set of 12 wafers, which was completed in December, appears to be fully
functional based on our preliminary test results. They contain several varied designs of the
EEPROM-based synapse, as well as constant-charge-packet injecting circuits based on charge
pumping or the CCD principle. We are characterizing these structures at present.

With the fabrication technology in place, we are working with several neural architecture
groups to build demonstration CMOS-EEPROM-based neural network chips. Ongoing
chip designs include a Layered Feedforward Neural Network chip and a Nearest Neighbor
Classification chip.

Research has begun on oxide antifuses as programmable weight devices. The eventual
goal is to have a small device suitable for neural chips containing millions of synapses. A
mask set for test structures has been designed and the first fabrication run has been completed.

Oxide antifuses consisting of 70 angstroms of SiO2 between the silicon substrate and
polysilicon gates were fabricated. We varied the doping concentration of the substrate and the
polysilicon gates. In some structures, Al covers the antifuse oxide area to investigate the
possibility of forming low-resistance filaments of Si-Al eutectic. Other structures insert varying
resistances between the antifuse and the probe pad capacitance to study the effect of the
dumping of the capacitively stored charge and energy into the antifuse at the time of oxide
breakdown.

Linear resistance that is programmable by varying the program current was observed in
antifuses involving low-resistivity materials. Antifuses made with high-resistivity silicon
showed exponential I-V characteristics after programming. We are in the process of
characterizing these behaviors.

III-D. Architectural Issues in Parallel Computation

Heterogeneous Architectures for ANN’s
Professor C. H. Séquin with Chedsada Chinrungrueng

The overall objective of our project is to evaluate heterogeneous architectures for
artificial neural networks (ANN’s). Currently we are investigating a hierarchical ANN
architecture consisting of two levels. The lower level is an unsupervised competitive learning
network whose task is to divide the overall task into several simpler subtasks. The upper level is
a collection of networks, each of which is trained to solve one of these subtasks.

During  the  past  six  months,  we  have  investigated  competitive  learning  algorithms  that 
divide  a  given  task  based  on  a  partitioning  of  the  input  domain  on  which  the  task  is  defined. 
We  have  developed  a  new  competitive  learning  algorithm  which  is  a  modification  of  the  tradi¬ 
tional  adaptive  k-means  clustering  algorithm.  It  divides  the  input  domain  by  constructing  in  a 
continuous  on-line  manner  a  Voronoi  partition  around  a  specified  number  of  clustering  centers. 
This  new  algorithm  can  approximate  an  optimal  partitioning  solution  with  efficient  adaptive 
learning  rates,  which  renders  it  usable  even  in  situations  where  the  statistics  of  the  problem 
task  slowly  vary  with  time. 
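
For orientation, the sketch below shows the traditional adaptive (on-line) k-means update that the new algorithm modifies: each incoming sample moves its nearest center a fraction of the way toward itself, which implicitly maintains a Voronoi partition around the centers. This is only the baseline with a fixed learning rate, not the modified algorithm described in this report. The helper that sums squared Euclidean distances per cluster is one common measure of within-cluster variation, and all names and data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(sample, centers):
    """Index of the center whose Voronoi region contains the sample."""
    return min(range(len(centers)), key=lambda k: squared_distance(sample, centers[k]))


def online_kmeans_step(sample, centers, learning_rate):
    """Move the winning center toward the sample (traditional adaptive k-means)."""
    k = nearest_center(sample, centers)
    centers[k] = [c + learning_rate * (s - c) for c, s in zip(centers[k], sample)]
    return k


def within_cluster_variation(samples, centers):
    """Sum of squared distances from each sample to its nearest center, per cluster."""
    variation = [0.0] * len(centers)
    for sample in samples:
        k = nearest_center(sample, centers)
        variation[k] += squared_distance(sample, centers[k])
    return variation


# Made-up one-dimensional data with two well-separated groups.
samples = [[0.0], [1.0], [9.0], [10.0]]
centers = [[0.0], [10.0]]
for _ in range(5):                      # a few passes with a fixed learning rate
    for sample in samples:
        online_kmeans_step(sample, centers, learning_rate=0.5)

variation = within_cluster_variation(samples, centers)
print(centers, variation)
```

With a fixed learning rate the centers keep fluctuating around their clusters, which illustrates why a rate that shrinks near equilibrium can reduce the residual deviation.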

These capabilities are achieved via two mechanisms. The first mechanism guides the
partition towards an optimal solution by aiming directly at minimizing the differences between the
averaged variations of each cluster. This allows the new algorithm to obtain a solution closer
to the optimal value than other competitive learning algorithms in the same class. The second
mechanism dynamically adjusts the learning rate based on the estimated deviation from an
equilibrium state where all clusters have an equal variation of the average number of input
samples that they have to cover. This allows the algorithm to learn very quickly initially or
after a change in the characteristics of the problem statement. As the partition approaches an
optimal solution, the learning rate decreases, which in turn allows the partition to move even
closer to the optimal value. Thus the algorithm can achieve a smaller residual deviation from
an optimal value than other competitive learning algorithms.


Both mechanisms are based on a necessary condition for the optimality of
the k-means partition: all of the regions in the optimal k-means partition have the
same within-cluster variation when the number of regions in the partition is large and the
probability distribution generating the training samples is smooth. The within-cluster variation of
a cluster is defined as the sum of the squared Euclidean distances between the pattern vectors
in that cluster and the center of the cluster.
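
To make the definition above concrete, the following minimal sketch computes the within-cluster variation of each Voronoi region and runs a plain adaptive (on-line) k-means update. It is not the modified algorithm described in this report, whose update rules are not given here. The function names, the fixed learning rate, and the `imbalance` measure are assumptions introduced purely for illustration.

```python
import random

def nearest(centers, x):
    """Index of the center closest to x, i.e. the Voronoi region containing x."""
    return min(range(len(centers)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(centers[k], x)))

def within_cluster_variation(points, centers):
    """Per cluster: sum of squared Euclidean distances from its points to its center."""
    v = [0.0] * len(centers)
    for x in points:
        k = nearest(centers, x)
        v[k] += sum((a - b) ** 2 for a, b in zip(centers[k], x))
    return v

def online_kmeans(points, centers, rate=0.05, epochs=20, seed=0):
    """Plain adaptive k-means: move the winning center a fraction `rate` toward each sample."""
    rng = random.Random(seed)
    centers = [list(c) for c in centers]
    pts = list(points)
    for _ in range(epochs):
        rng.shuffle(pts)
        for x in pts:
            k = nearest(centers, x)
            centers[k] = [c + rate * (a - c) for c, a in zip(centers[k], x)]
    return centers

def imbalance(variations):
    """Hypothetical measure: relative spread of the variations (0 when all are equal)."""
    mean = sum(variations) / len(variations)
    return 0.0 if mean == 0 else (max(variations) - min(variations)) / mean
```

A quantity such as `imbalance` could in principle drive an adaptive learning rate (large when the variations are unequal, small near equilibrium), but how the reported algorithm actually estimates the deviation from equilibrium is not specified in this section.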

Fault Tolerance in Layered ANN's
Professor C. H. Séquin with Reed Clay

Our  research  has  been  concerned  with  the  errors  resulting  from  defective  units  and  faulty 
weights  in  layered  feed-forward  ANN’s.  We  have  explored  and  analyzed  techniques  to  make 
these  networks  more  robust  against  such  failures.  First,  using  some  simple  examples  of  pattern 
classification  tasks  and  of  analog  function  approximation,  we  have  demonstrated  that  standard 
architectures  subjected  to  normal  backpropagation  training  techniques  do  not  lead  to  any 
noteworthy fault tolerance. Additional redundant hardware, coupled with suitable new training
techniques, is necessary to achieve that goal. A simple and general procedure has been found
that develops fault tolerance in neural networks: failures of the type that one might expect to
occur during operation are introduced at random during the training of the network, and the
resulting output errors are used in a standard way for backpropagation and weight adjustment.
The result of this training method is a modified internal representation that is not only more
robust to the type of failures encountered in training, but also more tolerant of faults
for which the network has not been explicitly trained.
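
The training procedure described above can be sketched as follows, assuming a single hidden layer of sigmoid units, stuck-at-zero hidden unit failures, and at most one injected failure per pattern presentation. These choices, together with all names and parameter values (`p_fault`, `rate`, and so on), are illustrative assumptions and are not taken from the experiments reported here.

```python
import math
import random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, W2, dead=()):
    """One hidden layer; hidden units whose index is in `dead` are stuck at 0."""
    h = [0.0 if j in dead else sigmoid(sum(w * xi for w, xi in zip(row, x + [1.0])))
         for j, row in enumerate(W1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h + [1.0])))
    return h, y

def train_with_faults(data, n_hidden=6, epochs=2000, rate=0.5, p_fault=0.5, seed=0):
    """Backpropagation in which hidden-unit failures are injected at random."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, t in data:
            # With probability p_fault, disable one randomly chosen hidden unit.
            dead = (rng.randrange(n_hidden),) if rng.random() < p_fault else ()
            h, y = forward(x, W1, W2, dead)
            d_out = (y - t) * y * (1.0 - y)  # squared-error gradient at the output
            for j in range(n_hidden):
                if j in dead:
                    continue  # a stuck unit passes no gradient to its input weights
                d_h = d_out * W2[j] * h[j] * (1.0 - h[j])
                for i, xi in enumerate(x + [1.0]):
                    W1[j][i] -= rate * d_h * xi
            for j, hj in enumerate(h + [1.0]):
                W2[j] -= rate * d_out * hj
    return W1, W2
```

The only difference from ordinary backpropagation is the `dead` set: the output error is computed with the fault in place and then propagated in the standard way, so the surviving units are pushed to compensate for the missing one.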

In the context of a discrete classification task, we have demonstrated that simple training
with  backpropagation  in  the  presence  of  various  hidden  unit  failures  can  lead  to  fault  tolerance 
with  respect  to  single  or  multiple  faults.  Similar  robustness  can  be  achieved  by  training  with 
multiple  failures  or  by  prolonged  training  with  single  failures.  Failures  in  the  input  units  are 
equivalent  to  noise  on  the  input  patterns.  While  training  the  network  with  hidden  unit  failures 
cannot  render  it  robust  against  input  failures,  training  with  input  noise  can  lead  to  some  degree 
of  fault  tolerance  also  with  respect  to  failures  in  the  hidden  layer. 

In  the  context  of  analog  function  approximation  tasks,  we  have  discovered  a  promising 
approach  that  achieves  fault  tolerance  by  tightly  controlling  the  fractional  contribution  that  each 
hidden  unit  makes  to  the  linearly  summed  output  value.  We  first  make  the  observation  that  the 
worst  case  output  errors  that  can  be  produced  by  the  failure  of  a  hidden  neuron  can  be  much 
worse  than  simply  the  loss  of  the  contribution  of  a  neuron  whose  output  goes  to  zero.  A  much 
larger erroneous signal can be produced when the failure drives a hidden neuron into saturation,
i.e., sets its output value to one of the power supply voltages.

To  counter  this  problem,  we  have  investigated  a  new  method  that  limits  the  fractional 
error in the output signal of a feed-forward net due to such saturated hidden unit faults in analog
function approximation tasks. The number of hidden units is significantly increased, and
the  maximal  contribution  of  each  unit  is  limited  to  a  small  fraction  of  the  net  output  signal.  To 
achieve  a  large  localized  output  signal,  several  Gaussian  hidden  units  are  moved  into  the  same 
location  in  the  input  domain  and  the  gain  of  the  linear  summing  output  unit  is  suitably 
adjusted.  Since  the  contribution  of  each  unit  is  equal  in  magnitude,  there  is  only  a  modest 
error  under  any  possible  failure  mode. 
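
The effect of replicating Gaussian units can be illustrated with a small numerical sketch. It assumes normalized hidden unit outputs in [0, 1], so that a saturated unit is stuck at 0 or at 1, and a single linear output with a common gain. The function names and the one-dimensional input are assumptions made for illustration only.

```python
import math

def gaussian(x, center, width):
    return math.exp(-((x - center) / width) ** 2)

def net_output(x, units, gain, stuck=None):
    """Linear sum of Gaussian hidden units; `stuck` maps a unit index to a forced output."""
    stuck = stuck or {}
    total = 0.0
    for j, (c, w) in enumerate(units):
        total += stuck[j] if j in stuck else gaussian(x, c, w)
    return gain * total

def worst_single_fault_error(xs, units, gain):
    """Largest output change over xs when any one unit is stuck at 0 or at 1."""
    worst = 0.0
    for j in range(len(units)):
        for level in (0.0, 1.0):
            for x in xs:
                healthy = net_output(x, units, gain)
                faulty = net_output(x, units, gain, {j: level})
                worst = max(worst, abs(faulty - healthy))
    return worst

xs = [i / 10 for i in range(-30, 31)]
# One unit carrying the whole peak versus ten co-located units at one tenth the gain.
single = worst_single_fault_error(xs, [(0.0, 1.0)], gain=1.0)
replicated = worst_single_fault_error(xs, [(0.0, 1.0)] * 10, gain=0.1)
```

Both networks produce the same fault-free peak of height 1, but in this sketch the worst single-fault error drops from about 1 to about 0.1, since no unit can change the output by more than the gain.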

We have also explored a hierarchical approach to building ANN's to obtain more general
fault tolerance, with a higher degree of accuracy, for analog data approximation tasks.
Several redundant subnets are used to perform the same approximation task in parallel, and a
supervisory circuit combines their outputs by eliminating the signals that fall outside some
margin and by averaging the outputs of the remaining subnets.
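
One possible reading of the supervisory step is sketched below. The report does not say what the margin is measured against, so the use of the median of the subnet outputs as the reference value is an assumption, as are the function name and the fallback when no output survives.

```python
from statistics import median

def supervise(outputs, margin):
    """Discard subnet outputs farther than `margin` from the median, average the rest."""
    ref = median(outputs)  # assumed reference value
    kept = [y for y in outputs if abs(y - ref) <= margin]
    if not kept:           # possible when the count is even and the margin is small
        return ref
    return sum(kept) / len(kept)

# Three subnets agree near 1.0; a fourth, faulty subnet reports 5.0 and is rejected.
combined = supervise([1.0, 1.1, 0.9, 5.0], margin=0.3)
```

Using the median rather than the mean as the reference keeps a single badly wrong subnet from shifting the acceptance window.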

Parallel Computing Network and Program

Professor  Abhiram  Ranade  with  M.T.  Raghunath  and  Robert  Boothe 

The  goal  of  our  research  is  to  develop  high  performance,  cost  effective  networks  for 
interconnecting  processors.  We  would  like  these  networks  to  be  suitable  for  running  a  wide 
variety  of  applications  including  neural  network  simulation  and  circuit  simulation.  The  work 
has two main parts: Network Design and Understanding Parallel Program Behavior.

While  network  design  depends  upon  many  factors,  our  primary  concern  is  packaging 
technology.  Large  networks  need  to  be  partitioned  and  packaged  in  a  hierarchical  manner  into 
chips,  boards,  and  racks.  Interconnections  at  the  lower  levels  of  the  hierarchy  will  be  cheaper 
and  faster.  We  plan  to  explore  hybrid  designs  in  which  the  lower  levels  of  the  hierarchy  use  a 
denser  network  than  the  one  used  at  the  higher  levels.  While  it  is  obvious  that  this  will 
improve  local  communication  performance  (e.g.  within  a  board),  we  believe  it  will  also  reduce 
the  latency  for  long  distance  communication  (e.g.  between  racks). 

We are currently developing simulators for different network architectures. We evaluate
network performance by simulating the execution of a synthetic workload which approximates
the execution of real programs. We simulated multistage interconnection networks to estimate
the performance improvements obtained by changing the radix, dilation, routing algorithms,
etc. Results are summarized in a paper presented at the Second IEEE Symposium on Parallel
and Distributed Processing [1]. We are currently evaluating the performance of the recently
proposed multibutterfly network [2]. In comparison to the butterfly network, the interconnection
pattern of the multibutterfly network is more complex, but it is capable of achieving better
latency and throughput. Our current simulations attempt to quantify this improvement in
performance. Based on these results and results from simulations of other networks, we will
evaluate the appropriate hybrid networks.
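
As background for the radix parameter mentioned above, the following minimal sketch shows how the radix of a plain butterfly-style multistage network determines the number of stages and the output port chosen at each stage under textbook destination-tag routing. It is not the simulator described here; the function names are hypothetical, and dilation and the additional wiring of the multibutterfly are not modeled.

```python
def stages_needed(n_ports, radix):
    """Smallest number of stages s such that radix**s >= n_ports."""
    stages, reach = 0, 1
    while reach < n_ports:
        reach *= radix
        stages += 1
    return stages

def destination_tag_route(dest, radix, stages):
    """Output port used at each stage: the base-`radix` digits of dest, most significant first."""
    if not 0 <= dest < radix ** stages:
        raise ValueError("destination out of range")
    digits = []
    for _ in range(stages):
        digits.append(dest % radix)
        dest //= radix
    return digits[::-1]

# A 64-port network needs 6 stages of 2x2 switches but only 3 stages of 4x4 switches.
hops_radix2 = stages_needed(64, 2)
hops_radix4 = stages_needed(64, 4)
```

Raising the radix shortens every path at the cost of larger switches, which is one of the trade-offs that such simulations are meant to quantify.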

In order to reduce the number of detailed simulations that need to be carried out, we are
defining  analytical  transformations  that  help  us  to  extrapolate  the  results  obtained  under  one  set 
of  simulation  parameters  to  other  sets  of  parameters. 

Long memory (or communication) latency is an inevitable feature of distributed
multiprocessors. Many latency tolerance and avoidance techniques have been proposed, and it is
desirable to evaluate them. We are developing a simulator that will enable us to do this, and in
general aid in understanding the communication behavior of parallel programs. Our simulator
works at the instruction level, and can model the long memory latencies that can arise in typical
multiprocessor networks. The simulator is nearly completed.

Simulating computers has historically been notoriously slow. Typical parallel machine
simulators at the instruction level incur a slowdown factor of 1,000 to 2,000. This large
slowdown limits the size of programs and machines that can be simulated. We have developed an
innovative simulator technology that, according to preliminary measurements, allows us to
reduce the slowdown factor to 50-100.

This tremendous speed-up has been made possible by converting what is usually an
interpretive process into direct execution. This is analogous to compilation. The program to be
simulated is first compiled normally and then analyzed extensively and modified to interact
with the simulator only at key points. Most instructions execute directly. Only shared
memory instructions call the simulator. In a typical program more than 90% of the instructions
can be executed directly, taking only a single cycle. These single-cycle instructions amortize
the cost of the remaining simulated instructions.
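
The amortization argument can be checked with a back-of-the-envelope calculation. The sketch below assumes a simple weighted-average cost model, and the per-instruction costs used in the example (1 cycle for direct execution, 1,000 cycles for an interpreted or simulated instruction) are hypothetical round numbers, not measurements from this project.

```python
def average_slowdown(direct_fraction, direct_cost, simulated_cost):
    """Average cycles per simulated instruction under a weighted-average cost model."""
    return direct_fraction * direct_cost + (1.0 - direct_fraction) * simulated_cost

# Every instruction interpreted at an assumed 1,000 cycles each.
fully_interpreted = average_slowdown(0.0, 1, 1000)

# 90% executed directly in one cycle, 10% handled by the simulator.
mostly_direct = average_slowdown(0.9, 1, 1000)
```

Under these assumed costs the average falls from 1,000 to roughly 100 cycles per instruction, which shows why the fraction of directly executed instructions dominates the overall slowdown.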

Once  the  simulator  is  completed,  we  will  begin  taking  measurements  of  and  gathering 
statistics  on  the  execution  behavior  of  large  shared  memory  parallel  programs.  Such  statistics 
are  desperately  needed  by  researchers  investigating  parallel  computers. 